Abstract

A central limit theorem for binary trees is numerically examined.
Two types of central limit theorem for higher-order branches are
formulated. The topological structure of a binary tree is expressed by a
binary sequence, and the Horton-Strahler indices are calculated by using
the sequence. By fitting the Gaussian distribution function to our
numerical data, the values of the variances are determined and written in
simple forms.

1 Introduction 

Branching patterns are widespread in nature [1,2]. Some patterns
appear to be quite similar to each other even if their generation processes
are different. Branching patterns are characterized from various
standpoints. For example, a property related to spatial configuration is
called geometric; such properties include length, spatial symmetry, and
fractality. On the other hand, a property based on graph-theoretic
structure (and not on spatial extent) is called topological. Connectivity
and degree distributions of complex networks are typical and important
topological structures. In particular, the topological structure of a
branching pattern can be expressed by a binary-tree graph.

A full binary tree is a tree graph (i.e., a connected graph without loops)
where every node has exactly zero or two 'children' (see Fig. 1 for
reference). For simplicity, we use the term 'binary tree' instead of 'full
binary tree' hereafter, since we focus only on full binary trees throughout
the paper. A node without any children is called a leaf, the node without
a 'parent' is called the root, and the number of leaves is called the
magnitude. Binary trees have been mainly investigated in computer science,
and are frequently used to represent some types of data structures such as
the binary search tree, binary heap, and expression tree [3,4].

Figure 1: An example of a binary tree of magnitude 6. The numbers on
the nodes represent the Horton-Strahler indices.

In order to derive topological characteristics of branching patterns, a
method of branch ordering has been introduced by Horton [5] and Strahler
[6]. With this method, the ramification complexity and the hierarchical
structure of branching patterns can be measured. For each node $v$ in a
binary tree $T$, the Horton-Strahler index $S(v)$ is defined recursively as

\[
S(v) =
\begin{cases}
1, & \text{if $v$ is a leaf,} \\
\max\{S(v_1), S(v_2)\} + \delta_{S(v_1),S(v_2)}, & \text{if $v_1$ and $v_2$ are the children of $v$,}
\end{cases}
\tag{1}
\]

where $\delta_{i,j}$ is the Kronecker delta. We define a branch of order $r$
as a maximal path connecting nodes of order $r$. The ratio of the numbers of
branches of two subsequent orders is called the bifurcation ratio, and it
has been found in many branching patterns that the bifurcation ratio takes
an almost constant value for different orders, which is known as "Horton's
law of stream numbers", especially in river networks [5]. Horton-Strahler
analysis has been applied to a wide range of branching patterns [7-15].
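As a minimal sketch of Eq. (1), assume that a binary tree is stored as nested Python pairs with a marker string for each leaf. This representation, the constant `LEAF`, and the function name `strahler` are illustrative choices, not taken from the paper. The function returns the index of the root and, optionally, counts the branches of each order: a new branch starts at every leaf and at every node whose two children have equal order.

```python
from collections import Counter

LEAF = "leaf"  # illustrative marker for a node without children


def strahler(tree, counts=None):
    """Return the Horton-Strahler index of the root of `tree` (Eq. (1)).

    `tree` is either LEAF or a pair (left, right) of subtrees.
    If `counts` is given, counts[r] accumulates the number of branches
    of order r.
    """
    if tree == LEAF:
        if counts is not None:
            counts[1] += 1          # every leaf is a branch of order 1
        return 1
    left, right = tree
    s1 = strahler(left, counts)
    s2 = strahler(right, counts)
    if s1 == s2:                    # Kronecker delta equals 1
        if counts is not None:
            counts[s1 + 1] += 1     # a branch of order s1 + 1 starts here
        return s1 + 1
    return max(s1, s2)              # the higher-order branch continues


counts = Counter()
root_order = strahler(((LEAF, LEAF), (LEAF, LEAF)), counts)
print(root_order, dict(counts))
```

For the symmetric tree of magnitude 4 used above, the two order-2 branches merge into a single branch of order 3 at the root.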

A simple model called the random model or equiprobable model, formulated
by Shreve [16], is a finite probability space $(\Omega_n, P_n)$, where
$\Omega_n$ denotes the sample space consisting of topologically distinct
binary trees of magnitude $n$, and $P_n$ is the uniform probability measure
on $\Omega_n$. We also introduce a random variable
$S_{r,n} : \Omega_n \to \mathbb{N} \cup \{0\}$ such that $S_{r,n}(T)$
represents the number of branches of order $r$ in a binary tree
$T \in \Omega_n$. Horton's law on $(\Omega_n, P_n)$ is stated in the form

\[
\frac{E(S_{r,n})}{E(S_{r-1,n})} \to \frac{1}{4} \quad \text{as } n \to \infty, \tag{2}
\]

where $E(\cdot)$ denotes the average on $(\Omega_n, P_n)$, and
$r = 2, 3, \cdots$. Analytical or combinatorial properties of $S_{r,n}$ are
discussed in [17-23], for example.
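Wang and Waymire analytically proved the central limit theorem

\[
\sqrt{n}\left(\frac{S_{2,n}}{n} - \frac{1}{4}\right) \Rightarrow N\!\left(0, \frac{1}{16}\right), \tag{3}
\]

where "$\Rightarrow$" denotes convergence in distribution, and
$N(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and
variance $\sigma^2$ [24]. Because every leaf is a branch of order 1 (i.e.,
$S_{1,n} = n$), Eq. (3) is equivalently expressed as

\[
\sqrt{n}\left(\frac{S_{2,n}}{S_{1,n}} - \frac{1}{4}\right) \Rightarrow N\!\left(0, \frac{1}{16}\right).
\]

In the same way as Eq. (2), we expect the following relation:

\[
\sqrt{n}\left(\frac{S_{r,n}}{S_{r-1,n}} - \frac{1}{4}\right) \Rightarrow N(0, \sigma_r^2). \tag{4a}
\]

Moreover, Eq. (3) is considered to be naturally generalized to

\[
\sqrt{n}\left(\frac{S_{r,n}}{n} - \frac{1}{4^{r-1}}\right) \Rightarrow N(0, \tilde{\sigma}_r^2), \tag{4b}
\]

where $\sigma_r^2$ and $\tilde{\sigma}_r^2$ are variances depending on the
order $r$. However, Eqs. (4) have not been proved analytically or verified
numerically so far, and the values of $\sigma_r$ and $\tilde{\sigma}_r$ have
not been obtained for $r \ge 3$. In the present paper, we propose a method
of calculating the Horton-Strahler indices of a binary tree by using a
binary sequence, and show numerical evidence for the validity of Eqs. (4).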



2 Correspondence between Binary Trees and Dyck 
Paths 

A Dyck path of length $2(n-1)$ is a sequence of points
$(s_0, \cdots, s_{2(n-1)})$ on a two-dimensional lattice from $s_0 = (0,0)$
to $s_{2(n-1)} = (n-1, n-1)$ such that each point $s_i = (x_i, y_i)$
satisfies $x_i \ge y_i$ and each elementary step $(s_i, s_{i+1})$ is either
rightward or upward (see Fig. 2).

For each Dyck path, a binary sequence of length 2(n— 1) is generated by 
replacing a rightward step with '1' and an upward step with '0'. The binary 
sequences generated by this replacement are formally called Dyck words 
on the alphabet {1,0} [25], and for simplicity we call them Dyck sequences 
throughout the paper. Clearly, Dyck sequences share the two properties: (i) 
the total number of '0' (and also '1') is n — 1, (ii) cumulative number of '0' 
is never greater than that of '1'. 
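Properties (i) and (ii) can be checked mechanically. The following is a small sketch, assuming the sequence is given as a string of the characters '0' and '1'; the function name `is_dyck_sequence` is hypothetical.

```python
def is_dyck_sequence(bits, n):
    """Check properties (i) and (ii) for a string of '0'/'1' characters."""
    if len(bits) != 2 * (n - 1) or set(bits) - {"0", "1"}:
        return False
    ones = zeros = 0
    for b in bits:
        if b == "1":
            ones += 1
        else:
            zeros += 1
        if zeros > ones:            # property (ii) is violated
            return False
    return ones == zeros == n - 1   # property (i)


print(is_dyck_sequence("1100", 3), is_dyck_sequence("1001", 3))
```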

A correspondence between the Dyck paths of length $2(n-1)$ and the
binary trees of magnitude $n$ is explained as follows (see Fig. 3 for
reference). (i) Start with a Dyck path of length $2(n-1)$. (ii) Draw
diagonal lines from upper right to lower left which are never below the
Dyck path. (iii) Extract only the diagonals and the vertical lines in the
Dyck path. It is found that the pattern obtained from this process is
topologically the same as a binary tree of magnitude $n$, shown in
Fig. 3(b). Note that the correspondence between Dyck paths and binary trees
is one-to-one. Therefore, a Dyck path possesses the same topological
structure as the corresponding binary tree.

Figure 2: An example of a Dyck path of length 16, running from (0,0) to
(8,8). Dashed lines indicate grid lines of $\mathbb{Z}^2$. All the Dyck
paths lie below the diagonal line.

The above method can be reformulated in a different way, where a Dyck
sequence is generated from a binary tree. Here, a binary tree is regarded
as a graph representing a successive merging process of two adjacent nodes,
and each merging is expressed by putting two nodes in parentheses '( )'.
Thus, the topological structure of a binary tree $T \in \Omega_n$ is fully
expressed by a sequence of the leaves $v_1, \cdots, v_n$ of $T$ and $n-1$
pairs of '( )' [an example is shown as step (i) in Fig. 4]. A
correspondence between a binary tree $T \in \Omega_n$ and a Dyck sequence
of length $2(n-1)$ consists of the following two steps. (i) Convert $T$
into a sequence of $v_1, \cdots, v_n$ and '( )'. (ii) Eliminate '$v_1$' and
'(', and replace $v_2, \cdots, v_n$ with '1' and ')' with '0'. The generated
binary sequence proves to be a Dyck sequence, and the correspondence is
one-to-one. Fig. 4 illustrates this correspondence. Note that this process
is similar to an expression tree and reverse Polish notation in formula
manipulation [26].
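Steps (i) and (ii) can be sketched in a few lines, assuming the tree is stored as nested Python pairs whose leaves are arbitrary non-tuple labels (an illustrative representation; the name `tree_to_dyck` is hypothetical). A post-order traversal emits one symbol per leaf and one per closing parenthesis, and the symbol of the first leaf is dropped at the end.

```python
def tree_to_dyck(tree):
    """Convert a binary tree (nested pairs, any non-tuple object as a leaf)
    into its Dyck sequence, returned as a string of '0' and '1'."""
    out = []

    def visit(node):
        if isinstance(node, tuple):
            left, right = node
            visit(left)
            visit(right)
            out.append("0")   # closing parenthesis ')' of this merging
        else:
            out.append("1")   # a leaf v_i

    visit(tree)
    return "".join(out[1:])   # eliminate the symbol of the first leaf v_1


# The tree written as ((v1 v2)(((v3 v4) v5)((v6 v7)(v8 v9)))) in Fig. 4
tree = ((1, 2), (((3, 4), 5), ((6, 7), (8, 9))))
print(tree_to_dyck(tree))  # 1011010110110000
```

The opening parentheses carry no extra information once the closing ones are kept, which is why they can be eliminated without losing the tree.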

The Horton-Strahler indices of a binary tree can be calculated through
the corresponding Dyck sequence. The method consists of the following two
steps: (i) Prepend '1' to the Dyck sequence. (ii) Replace a segment
'$m\ n\ 0$' ($m, n > 0$) with the single number
'$\max\{m, n\} + \delta_{m,n}$' recursively until the length of the sequence
becomes 1. It is found that the number of times the transformation
'$(r-1)\ (r-1)\ 0$' $\to$ '$r$' is applied is identical with $S_{r,n}(T)$
for $r \ge 2$. Note that operation (ii) is similar to Eq. (1), as shown in
Fig. 5.
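One way to carry out steps (i) and (ii) is a single left-to-right pass with a stack, which applies the same replacements in a fixed order (as in the evaluation of reverse Polish notation). The sketch below assumes the Dyck sequence is a string of '0' and '1'; the function name `branch_counts` is hypothetical.

```python
from collections import Counter


def branch_counts(dyck):
    """Return (order of the root, Counter r -> number of branches of order r)."""
    counts = Counter()
    stack = []
    for symbol in "1" + dyck:               # step (i): prepend '1'
        if symbol == "1":
            stack.append(1)                 # a leaf has order 1
            counts[1] += 1
        else:                               # step (ii): reduce 'm n 0'
            n_ord = stack.pop()
            m_ord = stack.pop()
            if m_ord == n_ord:
                stack.append(m_ord + 1)     # '(r-1) (r-1) 0' -> 'r'
                counts[m_ord + 1] += 1
            else:
                stack.append(max(m_ord, n_ord))
    return stack[0], counts


order, counts = branch_counts("1011010110110000")
print(order, dict(counts))
```

For the Dyck sequence of Fig. 4, the pass ends with a single number on the stack, which is the Horton-Strahler index of the root.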
Figure 3: (a) An illustration of how to get a binary tree from a Dyck path.
(i) The initial Dyck path of length 16. (ii) The Dyck path with diagonals
from upper right to lower left. (iii) The diagonals and vertical steps. The
structure of a binary tree can be seen. (b) The binary tree corresponding
to (a-iii).

( (v1 v2) ( ( (v3 v4) v5 ) ( (v6 v7) (v8 v9) ) ) )

1011010110110000   Dyck sequence

Figure 4: An illustration of the correspondence between a binary tree of
magnitude 9 and a Dyck sequence of length 16. In step (i), a binary tree is
converted into a sequence consisting of $v_1, \cdots, v_9$ and '( )'. In
step (ii), a Dyck sequence is generated by the rule of replacement.

Figure 5: Similarity between the structure of the Horton-Strahler indices
and the corresponding calculation process.



3 Generation of Random Dyck Paths 

A basic method for the generation of random Dyck paths is summarized
in [27]. In this section, we present the method in a slightly different
manner from [27]. We also propose a graphical representation of the
generation process.

Let $\mathcal{D}$ denote the set of points in $\mathbb{Z}^2$ through which
at least one Dyck path passes, that is,
$\mathcal{D} = \{(x,y) \in \mathbb{Z}^2 \mid 0 \le x, y \le n-1,\ x \ge y\}$.
We assign 'transition probabilities' $P_\to(x,y)$ and $P_\uparrow(x,y)$ on
each point $(x,y) \in \mathcal{D}$. Each elementary step $(s_i, s_{i+1})$
of a Dyck path $(s_0, \cdots, s_{2(n-1)})$ is selected stochastically:
stepping rightward with probability $P_\to(s_i)$ and upward with
probability $P_\uparrow(s_i)$. A set of transition probabilities yields a
generation probability of a Dyck path $(s_0, \cdots, s_{2(n-1)})$, which is
given by

\[
P(s_0, \cdots, s_{2(n-1)}) = \prod_{i=0}^{2(n-1)-1} p_i,
\quad \text{where} \quad
p_i =
\begin{cases}
P_\to(s_i), & \text{if $(s_i, s_{i+1})$ is rightward,} \\
P_\uparrow(s_i), & \text{if $(s_i, s_{i+1})$ is upward.}
\end{cases}
\]

Since we focus on the random binary-tree model, we need to determine the 
transition probabilities where every Dyck path is generated equiprobably. 

We define a monotonic path from $(x,y) \in \mathcal{D}$ as a sequence of
points on $\mathcal{D}$ from $(x,y)$ to $(n-1, n-1)$ where each elementary
step is either rightward or upward. Clearly, the length of a monotonic path
from $(x,y)$ is $2(n-1) - (x+y)$, and a monotonic path from $(0,0)$ is
identical with a Dyck path. The total number $N(x,y)$ of the monotonic
paths from $(x,y)$ is written as

\[
N(x,y) = \binom{2(n-1)-(x+y)}{n-x-1} - \binom{2(n-1)-(x+y)}{n-x-2}
= \frac{\{2(n-1)-(x+y)\}!}{(n-1-x)!\,(n-y)!}\,(x-y+1). \tag{5}
\]

For the calculation of Eq. (5), we employed the reflection principle
familiar in random-walk theory [28].

There are several remarks on $N(x,y)$:

1. For any $(x,y) \in \mathcal{D}$, $N(x,y)$ is positive.

2. $N(n-1, y) = 1$ for $y = 0, \cdots, n-1$.

3. If $(x,y)$ is on the diagonal [i.e., $(x,y) = (k,k)$], then
$N(k,k) = \frac{\{2(n-k-1)\}!}{(n-k-1)!\,(n-k)!}$, which is known as the
$(n-k-1)$th Catalan number [29].

4. The number of Dyck paths [which can be expressed as $N(0,0)$] is given
by the $(n-1)$th Catalan number. This is a well-known result, going back
to Cayley [30].

5. $N(x,y) = N(x+1,y) + N(x,y+1)$ for all $(x,y) \in \mathcal{D}$ other
than the end point $(n-1,n-1)$, where we set $N(x,y) = 0$ if
$(x,y) \notin \mathcal{D}$.
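The remarks above can be checked numerically from the closed form of Eq. (5). The following sketch uses exact integer arithmetic; the function names `num_paths` and `catalan` and the choice $n = 9$ are illustrative assumptions.

```python
from math import comb, factorial


def num_paths(x, y, n):
    """Number N(x, y) of monotonic paths from (x, y), closed form of Eq. (5)."""
    if not (0 <= y <= x <= n - 1):
        return 0                            # outside the domain
    m = 2 * (n - 1) - (x + y)               # number of remaining steps
    return factorial(m) * (x - y + 1) // (factorial(n - 1 - x) * factorial(n - y))


def catalan(k):
    """k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)


n = 9
for x in range(n):
    for y in range(x + 1):
        assert num_paths(x, y, n) > 0                       # remark 1
        if (x, y) != (n - 1, n - 1):                        # remark 5
            assert num_paths(x, y, n) == (num_paths(x + 1, y, n)
                                          + num_paths(x, y + 1, n))
for y in range(n):
    assert num_paths(n - 1, y, n) == 1                      # remark 2
for k in range(n):
    assert num_paths(k, k, n) == catalan(n - k - 1)         # remark 3
assert num_paths(0, 0, n) == catalan(n - 1)                 # remark 4
print(num_paths(0, 0, n))
```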

On each point $(x,y) \in \mathcal{D}$, we define transition probabilities
$P_\to(x,y)$ and $P_\uparrow(x,y)$ as

\[
P_\to(x,y) = \frac{N(x+1,y)}{N(x,y)}
= \frac{(n-1-x)(x-y+2)}{(1+x-y)\{2(n-1)-(x+y)\}}, \tag{6a}
\]

\[
P_\uparrow(x,y) = \frac{N(x,y+1)}{N(x,y)}
= \frac{(n-y)(x-y)}{(1+x-y)\{2(n-1)-(x+y)\}}. \tag{6b}
\]

Specifically, $P_\to + P_\uparrow = 1$, $P_\uparrow(k,k) = 0$, and
$P_\to(n-1,y) = 0$. It is also proved inductively that Eqs. (6) realize
uniformly random generation of Dyck paths.
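A minimal sampler based on Eq. (6a) might look as follows. It is a sketch under the assumption that a rightward step is recorded as '1' and an upward step as '0', as in Sec. 2; the function name `random_dyck_sequence` and the use of integer comparisons instead of floating-point probabilities are implementation choices made here, not taken from the paper.

```python
import random


def random_dyck_sequence(n, rng=random):
    """Sample a Dyck sequence of length 2(n-1) with the probabilities of Eqs. (6)."""
    x = y = 0
    bits = []
    for _ in range(2 * (n - 1)):
        num = (n - 1 - x) * (x - y + 2)               # numerator of P_right
        den = (x - y + 1) * (2 * (n - 1) - (x + y))   # denominator of Eqs. (6)
        if rng.randrange(den) < num:                  # true with probability num/den
            x += 1
            bits.append("1")                          # rightward step
        else:
            y += 1
            bits.append("0")                          # upward step
    return "".join(bits)


rng = random.Random(0)
samples = [random_dyck_sequence(4, rng) for _ in range(5000)]
for seq in sorted(set(samples)):
    print(seq, samples.count(seq))
```

For $n = 4$ there are five Dyck sequences, so each should appear in roughly one fifth of the samples.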

Next, we propose a graphical representation of random Dyck paths. The
number $N(x,y)$ can be calculated graphically as follows:

(i) Set $N(n-1, y) = 1$ for all the rightmost points $(n-1, y)$
($y = 0, \cdots, n-1$) of $\mathcal{D}$. This implies that there is only
one monotonic path from $(n-1, y)$, which is composed only of upward
steps.

(ii) For convenience, let $N(x,y) = 0$ for all $(x,y) \notin \mathcal{D}$.

(iii) $N(x,y)$ is calculated from $N(x,y) = N(x+1,y) + N(x,y+1)$, that
is, $N(x,y)$ is given by the sum of the values of $N$ on the right and
upper adjacent points [thus, $N(x,y)$ is calculated from right to left,
top to bottom]. This implies that the monotonic paths from $(x,y)$ consist
of the ones passing through $(x+1, y)$ and the ones passing through
$(x, y+1)$.
Figure 6: An example of the graphical representation of generation
probability ($n = 5$). The dashed lines indicate the grid lines of
$\mathcal{D}$. Each number near a lattice point indicates $N(x,y)$. From
successive canceling, we can see that all Dyck paths are generated with
the same probability.



Note that $N(x,y)$ determined from (i)-(iii) is identical with Eq. (5). The
graphical representation and examples of generation probability are
depicted in Fig. 6. We can roughly confirm the uniformity of generated
Dyck paths through successive canceling.
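The graphical rules (i)-(iii) and the successive canceling can be reproduced with exact fractions. In the sketch below, the table is filled from right to left and top to bottom, and the generation probability of a path is the product of the ratios $N(\text{next point})/N(\text{current point})$; the function names are hypothetical.

```python
from fractions import Fraction


def path_count_table(n):
    """Build N(x, y) on the domain by steps (i)-(iii)."""
    N = {}
    for x in range(n - 1, -1, -1):          # right to left
        for y in range(x, -1, -1):          # top to bottom
            if x == n - 1:
                N[(x, y)] = 1               # step (i)
            else:
                # steps (ii)-(iii): points outside the domain contribute 0
                N[(x, y)] = N.get((x + 1, y), 0) + N.get((x, y + 1), 0)
    return N


def generation_probability(bits, n):
    """Product of transition probabilities along the path given by `bits`."""
    N = path_count_table(n)
    x = y = 0
    prob = Fraction(1)
    for b in bits:
        nx, ny = (x + 1, y) if b == "1" else (x, y + 1)
        prob *= Fraction(N[(nx, ny)], N[(x, y)])   # numerators cancel successively
        x, y = nx, ny
    return prob


print(generation_probability("11110000", 5), generation_probability("10101010", 5))
```

Because each numerator cancels the next denominator, only $1/N(0,0)$ survives, which for $n = 5$ is $1/14$ for every Dyck path.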



4 Numerical Procedure 

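The Gaussian distribution function with mean $\mu$ and variance $\sigma^2$
is written as

\[
\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] dt
= \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sqrt{2}\,\sigma}\right)\right], \tag{7}
\]

where $\mathrm{erf}(x)$ is the error function defined as

\[
\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt.
\]

Thus, the central limit theorems (4a) and (4b) are respectively rewritten
as

\[
\Pr\left[\sqrt{n}\left(\frac{S_{r,n}}{S_{r-1,n}} - \frac{1}{4}\right) < x\right]
\to \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}\,\sigma_r}\right)\right], \tag{8a}
\]

\[
\Pr\left[\sqrt{n}\left(\frac{S_{r,n}}{n} - \frac{1}{4^{r-1}}\right) < x\right]
\to \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}\,\tilde{\sigma}_r}\right)\right], \tag{8b}
\]

as $n \to \infty$.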

A numerical algorithm for the calculation of $\sigma_r$ and
$\tilde{\sigma}_r$ is summarized as follows:

(i) Generate Dyck sequences of length $2(n-1)$ randomly, on the basis of
the method in Sec. 3.

(ii) Calculate the Horton-Strahler indices of the Dyck sequences.

(iii) Compute the values of both
$\sqrt{n}\left(\frac{S_{r,n}}{S_{r-1,n}} - \frac{1}{4}\right)$ and
$\sqrt{n}\left(\frac{S_{r,n}}{n} - \frac{1}{4^{r-1}}\right)$ for
$r = 2, 3, \cdots$.

(iv) Make distribution functions from the values, then determine the values
of $\sigma_r$ and $\tilde{\sigma}_r$ by fitting Eq. (7) to the distribution
functions.
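Steps (i)-(iv) can be sketched end to end on a much smaller scale than the runs reported in this paper. The example below treats only $r = 2$ and the quantity $\sqrt{n}(S_{r,n}/n - 1/4^{r-1})$. The sample size, the magnitude $n = 200$, the zero-mean assumption, and the grid-search least-squares fit are all assumptions made for illustration; the fitting procedure actually used for the reported values is not specified here.

```python
import math
import random


def random_dyck_sequence(n, rng):
    """Step (i): uniformly random Dyck sequence (list of 0/1) from Eqs. (6)."""
    x = y = 0
    bits = []
    for _ in range(2 * (n - 1)):
        num = (n - 1 - x) * (x - y + 2)
        den = (x - y + 1) * (2 * (n - 1) - (x + y))
        if rng.randrange(den) < num:
            x += 1
            bits.append(1)
        else:
            y += 1
            bits.append(0)
    return bits


def branch_count(bits, r):
    """Step (ii): number of branches of order r (r >= 2) via a stack."""
    stack, count = [], 0
    for b in [1] + bits:
        if b == 1:
            stack.append(1)
        else:
            a, c = stack.pop(), stack.pop()
            if a == c:
                stack.append(a + 1)
                count += (a + 1 == r)
            else:
                stack.append(max(a, c))
    return count


def gaussian_cdf(x, sigma):
    """Zero-mean Gaussian distribution function, Eq. (7) with mu = 0."""
    return 0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * sigma)))


def fit_sigma(values, candidates):
    """Step (iv): least-squares fit to the empirical distribution function."""
    xs = sorted(values)
    m = len(xs)

    def loss(sigma):
        return sum((gaussian_cdf(x, sigma) - (i + 0.5) / m) ** 2
                   for i, x in enumerate(xs))

    return min(candidates, key=loss)


rng = random.Random(1)
n, samples, r = 200, 1000, 2
# Step (iii): scaled deviations of S_{r,n}/n from 1/4^(r-1)
values = [math.sqrt(n) * (branch_count(random_dyck_sequence(n, rng), r) / n
                          - 4.0 ** (1 - r))
          for _ in range(samples)]
sigma_fit = fit_sigma(values, [0.01 * k for k in range(5, 101)])
print(sigma_fit)
```

With such a small $n$ the distribution is visibly discrete, so the fitted value is only a rough estimate.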



5 Results of the Central Limit Theorem 



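Fig. 7 shows distribution functions of
$\sqrt{n}\left(\frac{S_{r,n}}{S_{r-1,n}} - \frac{1}{4}\right)$ and
$\sqrt{n}\left(\frac{S_{r,n}}{n} - \frac{1}{4^{r-1}}\right)$
generated from 10^ samples with $n = 10000$. The stepwise increases appear
in the cases of $r = 6$ and $7$ in Fig. 7(a), because the denominator
$S_{r-1,n}$ of the fraction $S_{r,n}/S_{r-1,n}$ is decreasing with respect
to $r$, so that the fraction takes only coarsely spaced values for large
$r$.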



By fitting the distribution function (7) to each data set in Fig. 7, we
obtain Table 1 and Fig. 8, which suggest the relations

\[
\sigma_r = 2^{r-4}, \tag{9a}
\]

\[
\tilde{\sigma}_r = 2^{-r}. \tag{9b}
\]

Eq. (9a) is in good agreement with our numerical results. Eq. (9b) also
seems to be consistent with our results, although there are errors of about
a few percent (< 4%) between $r$ and $-\log_2 \tilde{\sigma}_r$.




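Figure 7: Distribution functions of (a)
$\sqrt{n}\left(\frac{S_{r,n}}{S_{r-1,n}} - \frac{1}{4}\right)$ and (b)
$\sqrt{n}\left(\frac{S_{r,n}}{n} - \frac{1}{4^{r-1}}\right)$ with
$n = 10000$ and $r = 2, \cdots, 7$, generated from 10^ samples.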



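In conclusion, the two central limit theorems are stated as

\[
\sqrt{n}\left(\frac{S_{r,n}}{S_{r-1,n}} - \frac{1}{4}\right) \Rightarrow N(0, 4^{r-4}), \tag{10a}
\]

\[
\sqrt{n}\left(\frac{S_{r,n}}{n} - \frac{1}{4^{r-1}}\right) \Rightarrow N(0, 4^{-r}). \tag{10b}
\]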



Table 1: Values of $\sigma_r$ and $\tilde{\sigma}_r$ obtained by fitting.

  r    sigma_r   2^(r-4)   sigma~_r   2^(-r)     -log2 sigma~_r
  2    0.2492    0.25      0.2502     0.25       1.999
  3    0.4999    0.5       0.1398     0.125      2.839
  4    0.9968    1         0.07165    0.0625     3.803
  5    2.0001    2         0.03605    0.03125    4.794
  6    4.0250    4         0.01798    0.015625   5.797


Figure 8: $r$-dependence of $\sigma_r$ and $\tilde{\sigma}_r$. The solid
line indicates $2^{r-4}$ and the dashed line indicates $2^{-r}$.

Note that both Eqs. (10) reduce to Eq. (3) when $r = 2$, since
$S_{1,n} = n$.
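The comparison columns of Table 1 follow from simple arithmetic on the fitted values. The short sketch below recomputes $2^{r-4}$, $2^{-r}$, and $-\log_2 \tilde{\sigma}_r$ from the fitted numbers copied from Table 1; the variable names are illustrative.

```python
import math

# (r, fitted sigma_r, fitted sigma-tilde_r), copied from Table 1
table1 = [
    (2, 0.2492, 0.2502),
    (3, 0.4999, 0.1398),
    (4, 0.9968, 0.07165),
    (5, 2.0001, 0.03605),
    (6, 4.0250, 0.01798),
]

rows = []
for r, sigma, sigma_tilde in table1:
    rows.append((r,
                 sigma, 2.0 ** (r - 4),            # compare with Eq. (9a)
                 sigma_tilde, 2.0 ** (-r),         # compare with Eq. (9b)
                 -math.log2(sigma_tilde)))         # last column of Table 1

for row in rows:
    print("%d  %.4f  %.4f  %.5f  %.6f  %.3f" % row)
```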



6 Discussion 



The Horton-Strahler index is based on 'merging' or 'joining' of branches
in a binary tree, and a Dyck sequence generated by the method in Sec. 2
preserves the merging structure of the initial binary tree. Thus, the
correspondence presented in this paper is suitable for the calculation of
Horton-Strahler indices. It is known that there are some other one-to-one
correspondences between Dyck paths and binary trees [29,31,32]. However,
Dyck paths generated from such other methods are not directly connected
to the Horton-Strahler indices.

Our method can supply various numerical calculations based on the random
binary-tree model, not only the central limit theorems. For example, as
shown in Fig. 9, our method is able to reproduce quite well the asymptotic
expansion of the bifurcation ratio

\[
\frac{E(S_{r,n})}{E(S_{r+1,n})} = 4 - \frac{4^r}{2n} + O(n^{-2}), \quad r \ge 1, \tag{11}
\]

which has been obtained analytically by Moon [33].

Figure 9: Comparison between analytical and numerical results of
bifurcation ratios. Points denote numerical results, and lines denote the
asymptotic forms $4 - \frac{4^r}{2n}$ for $r = 1, 2, 3, 4$. Numerical data
are generated from 10^ samples for each $n$ at intervals of 100.
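For small magnitudes, the left-hand side of Eq. (11) can also be evaluated exactly by enumerating every Dyck sequence instead of sampling. The following sketch does this for $r = 1$ and $n = 10$ and prints the exact ratio next to $4 - 4^r/(2n)$. The function names are hypothetical, and since Eq. (11) is an asymptotic expansion, only approximate agreement should be expected at such a small $n$.

```python
from fractions import Fraction


def all_dyck_sequences(n):
    """Yield every Dyck sequence of length 2(n-1) as a list of 0/1."""
    def extend(prefix, ones, zeros):
        if ones == zeros == n - 1:
            yield list(prefix)
            return
        if ones < n - 1:
            prefix.append(1)
            yield from extend(prefix, ones + 1, zeros)
            prefix.pop()
        if zeros < ones:                    # '0's never outnumber '1's
            prefix.append(0)
            yield from extend(prefix, ones, zeros + 1)
            prefix.pop()
    yield from extend([], 0, 0)


def branch_counts(bits):
    """Return {r: S_r} using the stack reduction of Sec. 2."""
    counts, stack = {1: 0}, []
    for b in [1] + bits:
        if b == 1:
            stack.append(1)
            counts[1] += 1
        else:
            a, c = stack.pop(), stack.pop()
            top = a + 1 if a == c else max(a, c)
            if a == c:
                counts[top] = counts.get(top, 0) + 1
            stack.append(top)
    return counts


def mean_counts(n):
    """Exact E(S_r) over all binary trees of magnitude n (uniform measure)."""
    totals, trees = {}, 0
    for bits in all_dyck_sequences(n):
        trees += 1
        for r, s in branch_counts(bits).items():
            totals[r] = totals.get(r, 0) + s
    return {r: Fraction(t, trees) for r, t in totals.items()}


n = 10
E = mean_counts(n)
ratio = E[1] / E[2]
print(float(ratio), 4 - 4 ** 1 / (2 * n))
```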

Generation of random Dyck paths can be regarded as a Markov process on
$\mathcal{D}$, which is called the Bernoulli excursion [34]. In addition,
by taking a certain scaling limit, the Bernoulli excursion converges weakly
to a diffusion process called the Brownian excursion [35], which is defined
as a one-dimensional Brownian motion $\{B(t) : 0 \le t \le 1\}$ such that
$P(B(0) = 0) = P(B(1) = 0) = 1$ and $P(B(t) > 0) = 1$ for $0 < t < 1$. We
expect that some asymptotic properties of the random binary-tree model can
be derived from the corresponding scaling limit.

Furthermore, the number $N(x,y)$ given by Eq. (5) is an example of
the Kostka number, appearing in some combinatorial problems [36,37]. It
is expected that such other systems are related to the generation of random
Dyck paths.



7 Conclusion 

In the present paper, we propose a numerical method of generating random
binary trees in the form of Dyck sequences. We also propose a method of
calculating the Horton-Strahler indices from Dyck sequences. From numerical
results, we confirm that $\sigma_r$ and $\tilde{\sigma}_r$ are determined as
in Eqs. (9). Therefore, the validity of the central limit theorems (10) is
suggested numerically.